Davin Dillon
2022-04-18
I used a Kaggle data set from user alexsychu to gather more data, extending coverage up to 2020. This second data set added 7,668 rows. Its author was trying to answer the question of how to predict whether a movie will do well.
Lastly, for movie data, I used a Kaggle data set from user unanimad. This data set has 10,395 rows, and its author asked various questions about who won Oscars.
Preparing the data took many column renames and, in some cases, some work to choose between conflicting movie budgets. Once I had the columns I wanted, I set about joining the data together. First, I joined the metadata with the budget data to include all of the titles for which I had information. My next adventure (or misadventure) was to join this data with the inflation data. Once that was done, all that was left was joining this information with the Oscar data. All in all, I used three full joins.
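As a minimal sketch of the first of those joins, the toy example below does a full (outer) join of two small tables on explicit key columns and then picks one budget per title. The data values are invented for illustration, and base R's `merge()` stands in for dplyr's `full_join()` so that the snippet runs without extra packages.

```r
# Toy stand-ins for the metadata and budget tables; all values are made up.
meta <- data.frame(Title = c("A", "B"), year = c(1999, 2004),
                   meta_budget = c(100, NA))
budget <- data.frame(Title = c("B", "C"), year = c(2004, 2010),
                     new_budget = c(250, 400))

# Full (outer) join on explicit keys, keeping titles found in either table.
# dplyr equivalent: full_join(meta, budget, by = c("Title", "year"))
full_budget <- merge(meta, budget, by = c("Title", "year"), all = TRUE)

# Keep the larger of the two budget estimates, ignoring missing values.
full_budget$budget <- pmax(full_budget$new_budget, full_budget$meta_budget,
                           na.rm = TRUE)
print(full_budget)
```

Naming the key columns explicitly keeps the join from silently matching on every column the two tables happen to share.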
oscars <- oscars %>% # rename year for join
  rename('year' = 'year_film')
meta$year <- format(as.Date(meta$release_date, format = "%m/%d/%Y"), "%Y")
blue = '#000080'
meta <- meta %>%
  mutate(year = as.numeric(year))
# renames
budget <- budget %>%
  rename(vote_average = score) %>% # renaming vote data for joins etc
  rename(vote_count = votes) %>%
  rename(Title = 'Movie Title')
oscars <- oscars %>%
  rename(Title = 'film') # rename for joins
# create multiplier column for easy calculations
adj <- inflation %>%
  mutate(multiplier = (22.82/amount))
budget <- budget %>%
  rename(new_budget = Budget) # rename for joins etc
options(scipen = 100) # avoid scientific notation
# full join of metadata and budget data to get new and old movies etc
# full_join() takes `by`, not `on`; these are the shared columns the join matched on
full_budget <- full_join(meta, budget,
  by = c("runtime", "Title", "vote_average", "vote_count", "year"))
full_bud <- full_budget %>%
  # set budget to max of two different budgets.
  # picking max is arbitrary, but needed in most cases
  mutate(budget = pmax(new_budget, meta_budget, na.rm = TRUE)) %>%
  select(Title, genres, budget, new_budget, meta_budget, popularity, year,
         release_date, revenue, runtime, vote_average,
         vote_count, gross)
# remove first three unnecessary rows
full_bud <- full_bud[-c(1, 2, 3), ] %>%
  arrange(desc(as.numeric(budget)))
# format(date, format="%Y")
full_bud <- full_bud %>%
mutate(year = replace(year, year == 1900, 2022))
# join on year, the only column the two tables share (full_join() takes `by`, not `on`)
adj_bud <- full_join(full_bud, adj, by = "year") %>%
  mutate(with_inflation = as.numeric(budget) * multiplier) %>%
  # use the joined `gross` column; full_bud$gross may not line up row-for-row after the join
  mutate(gross_inflation = as.numeric(gross) * multiplier) %>%
  select(Title, genres, vote_average, vote_count, budget,
         with_inflation, gross_inflation, year, gross,
         release_date) %>%
  arrange(desc(with_inflation))
## Joining, by = c("Title", "year")
The data I was able to collect contains 4,834 movies nominated for an Academy Award since the awards' inception in 1929. Of these 4,834 movies, 1,274 won at least one award. There have been 13,312 total Oscar nominations and 1,274 total Oscar wins in the dataset. 559 movies have been nominated for Best Picture in its many forms; of these, 92 won. 1,154 movies have had an actor or actress nominated in either a leading or supporting role, and 313 movies had at least one winner in an acting category.
Title | budget ($) | gross ($) | pct_prof (gross as % of budget) |
Paranormal Activity | 15,000 | 193,355,800 | 1,289,038.667 |
The Blair Witch Project | 60,000 | 248,639,099 | 414,398.498 |
The Gallows | 100,000 | 42,964,410 | 42,964.410 |
El Mariachi | 7,000 | 2,040,920 | 29,156.000 |
Once | 150,000 | 20,936,722 | 13,957.815 |
Clerks | 27,000 | 3,151,130 | 11,670.852 |
Napoleon Dynamite | 400,000 | 46,138,887 | 11,534.722 |
In the Company of Men | 25,000 | 2,804,473 | 11,217.892 |
Keeping Mum | 169,000 | 18,586,834 | 10,998.127 |
Open Water | 500,000 | 54,683,487 | 10,936.697 |
The Devil Inside | 1,000,000 | 101,758,490 | 10,175.849 |
The Quiet Ones | 200,000 | 17,835,162 | 8,917.581 |
Saw | 1,200,000 | 103,911,669 | 8,659.306 |
Searching | 880,000 | 75,462,037 | 8,575.231 |
Primer | 7,000 | 545,436 | 7,791.943 |
E.T. the Extra-Terrestrial | 10,500,000 | 792,910,554 | 7,551.529 |
My Big Fat Greek Wedding | 5,000,000 | 368,744,044 | 7,374.881 |
The Full Monty | 3,500,000 | 257,938,649 | 7,369.676 |
The Full Monty | 3,500,000 | 257,938,649 | 7,369.676 |
The Full Monty | 3,500,000 | 257,938,649 | 7,369.676 |
The Full Monty | 3,500,000 | 257,938,649 | 7,369.676 |
Friday the 13th | 550,000 | 39,754,601 | 7,228.109 |
Fireproof | 500,000 | 33,473,297 | 6,694.659 |
Insidious | 1,500,000 | 99,557,032 | 6,637.135 |
Unfriended | 1,000,000 | 62,882,090 | 6,288.209 |
Paranormal Activity 2 | 3,000,000 | 177,512,032 | 5,917.068 |
Get Out | 4,500,000 | 255,589,157 | 5,679.759 |
Get Out | 4,500,000 | 255,589,157 | 5,679.759 |
Get Out | 4,500,000 | 255,589,157 | 5,679.759 |
Get Out | 4,500,000 | 255,589,157 | 5,679.759 |
Four Weddings and a Funeral | 4,400,000 | 245,700,832 | 5,584.110 |
Four Weddings and a Funeral | 4,400,000 | 245,700,832 | 5,584.110 |
Pi | 60,000 | 3,221,152 | 5,368.587 |
Slacker | 23,000 | 1,228,108 | 5,339.600 |
Hollywood Shuffle | 100,000 | 5,228,617 | 5,228.617 |
The Breakfast Club | 1,000,000 | 51,525,171 | 5,152.517 |
Taxi 3 | 1,300,000 | 65,497,208 | 5,038.247 |
Valley Girl | 350,000 | 17,343,596 | 4,955.313 |
Chasing Amy | 250,000 | 12,021,272 | 4,808.509 |
Clifford's Really Big Movie | 70,000 | 3,255,426 | 4,650.609 |
A Separation | 500,000 | 22,926,076 | 4,585.215 |
A Separation | 500,000 | 22,926,076 | 4,585.215 |
Porky's | 2,500,000 | 111,289,673 | 4,451.587 |
The Brothers McMullen | 238,000 | 10,426,506 | 4,380.885 |
She's Gotta Have It | 175,000 | 7,137,502 | 4,078.573 |
Annabelle | 6,500,000 | 257,579,282 | 3,962.758 |
Look Who's Talking | 7,500,000 | 296,999,813 | 3,959.998 |
The Lives of Others | 2,000,000 | 77,356,942 | 3,867.847 |
The Last Exorcism | 1,800,000 | 69,432,527 | 3,857.363 |
Chernobyl Diaries | 1,000,000 | 38,390,020 | 3,839.002 |
Title | budget ($) | gross ($) | pct_prof (gross as % of budget) |
Trojan War | 15,000,000 | 309 | 0.002060000 |
Madadayo | 11,900,000 | 596 | 0.005008403 |
Ginger Snaps | 5,000,000 | 2,554 | 0.051080000 |
Philadelphia Experiment II | 5,000,000 | 2,970 | 0.059400000 |
The Lovers on the Bridge | 28,000,000 | 29,679 | 0.105996429 |
Savior | 10,000,000 | 14,328 | 0.143280000 |
Tanner Hall | 3,000,000 | 5,073 | 0.169100000 |
Crimewave | 3,000,000 | 5,101 | 0.170033333 |
Deadfall | 10,000,000 | 18,369 | 0.183690000 |
Hell's Kitchen | 6,000,000 | 11,710 | 0.195166667 |
Barefoot | 6,000,000 | 15,071 | 0.251183333 |
Freaked | 11,000,000 | 29,296 | 0.266327273 |
Parasite | 800,000 | 2,270 | 0.283750000 |
Passion Play | 8,000,000 | 25,603 | 0.320037500 |
About Cherry | 2,500,000 | 8,315 | 0.332600000 |
Rock & Rule | 8,000,000 | 30,379 | 0.379737500 |
Best Laid Plans | 7,000,000 | 27,816 | 0.397371429 |
Brenda Starr | 16,000,000 | 67,878 | 0.424237500 |
O.C. and Stiggs | 7,000,000 | 29,815 | 0.425928571 |
My Summer Story | 15,000,000 | 70,936 | 0.472906667 |
The Boondock Saints | 6,000,000 | 30,471 | 0.507850000 |
Vamps | 16,000,000 | 92,748 | 0.579675000 |
Love Ranch | 25,000,000 | 146,149 | 0.584596000 |
Arizona Dream | 19,000,000 | 112,547 | 0.592352632 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
The Irishman | 159,000,000 | 968,853 | 0.609341509 |
Pulse | 6,000,000 | 40,397 | 0.673283333 |
Smooth Talk | 2,400,000 | 16,785 | 0.699375000 |
Dominion | 30,000,000 | 251,495 | 0.838316667 |
Surfer, Dude | 6,000,000 | 52,132 | 0.868866667 |
Postal | 15,000,000 | 146,741 | 0.978273333 |
Crackers | 12,000,000 | 129,268 | 1.077233333 |
Bloodhounds of Broadway | 4,000,000 | 43,671 | 1.091775000 |
The Last Time I Committed Suicide | 4,000,000 | 46,362 | 1.159050000 |
Phobia | 5,100,000 | 59,167 | 1.160137255 |
There Goes My Baby | 10,500,000 | 123,509 | 1.176276190 |
Gentlemen Broncos | 10,000,000 | 118,492 | 1.184920000 |
Animal Factory | 3,600,000 | 43,805 | 1.216805556 |
Underground | 14,000,000 | 171,082 | 1.222014286 |
Revolution | 28,000,000 | 358,574 | 1.280621429 |
Fandango | 7,000,000 | 91,666 | 1.309514286 |
The Million Dollar Hotel | 8,000,000 | 105,983 | 1.324787500 |
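The `pct_prof` values in the tables above are consistent with gross divided by budget, times 100. Some titles also appear more than once, which is what a join against a table with several rows per title would produce (an assumption; the source does not show that step). The sketch below recomputes the column for a few rows copied from the tables and drops exact duplicates first. It uses base R only; `distinct()` would be the dplyr equivalent of `unique()` here.

```r
# Rows copied from the tables above; The Full Monty is repeated on purpose.
movies <- data.frame(
  Title  = c("Paranormal Activity", "The Full Monty", "The Full Monty", "Trojan War"),
  budget = c(15000, 3500000, 3500000, 15000000),
  gross  = c(193355800, 257938649, 257938649, 309)
)

# Drop exact duplicate rows so each title is counted once.
movies <- unique(movies)

# Gross expressed as a percentage of budget.
movies$pct_prof <- movies$gross / movies$budget * 100

# Sort from highest to lowest return.
movies <- movies[order(-movies$pct_prof), ]
print(movies)
```

Because the value is gross as a share of budget rather than profit, a film that exactly earned back its budget would score 100, not 0.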
Adjusted for inflation, the lowest-budget Best Picture winner was Marty, at $3,674,769.95. The average inflation-adjusted budget for a Best Picture winner was $51,335,637.82. The highest was Titanic, at $358,241,758.24.
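As a worked check of the inflation adjustment, the sketch below applies the same `22.82 / amount` multiplier used earlier to a single film. Two inputs are assumptions rather than values read from the data: a nominal Titanic budget of $200,000,000, and a 1997 index value of 12.74, which is back-computed so that the result matches the Titanic figure quoted above.

```r
target_index <- 22.82   # numerator used in the multiplier column
index_1997   <- 12.74   # ASSUMED: back-computed, not read from the inflation data
nominal      <- 200e6   # ASSUMED nominal Titanic budget, in dollars

# Same calculation as the multiplier and with_inflation columns.
multiplier     <- target_index / index_1997
with_inflation <- nominal * multiplier

# Print with thousands separators and two decimals.
print(formatC(with_inflation, format = "f", digits = 2, big.mark = ","))
```

The adjustment is a simple ratio of two index values, so any error in the index for a film's release year scales that film's adjusted budget by the same proportion.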